CONCENTRATION OF HAAR MEASURES, WITH AN 
APPLICATION TO RANDOM MATRICES 
Abstract. We show that the mixing times of random walks on compact
groups can be used to obtain concentration inequalities for the
respective Haar measures. As an application, we derive a concentration
inequality for the empirical distribution of eigenvalues of sums of random
hermitian matrices, with possible applications in free probability. 
The advantage over existing techniques is that the new method can deal 
with functions that are non-Lipschitz or even discontinuous with respect 
to the usual metrics. 



1. Introduction and results 

Much attention has been paid to the derivation of concentration inequal- 
ities through logarithmic Sobolev inequalities, semigroup or transportation 
methods. We refer to the monograph of Ledoux [9] for an extensive survey.
On the other hand, starting with the pioneering work of Diaconis and 
Saloff-Coste [3] E] (see also [H]), it is known that the rate of convergence 
to equilibrium of certain ergodic Markov semigroups or of random walks 
on compact groups involves logarithmic Sobolev constants. In this paper we 
make an explicit connection between the mixing time of a random walk on 
a compact group and the concentration property of the Haar measure. In 
other words, the rate of convergence to equilibrium and the rate of concentration
are directly connected. This is made precise in Theorem 1.2 below. 

We demonstrate that this new approach to concentration can be successfully
applied to random matrices. The gain as compared to previous methods 
is that it allows one to deal with functions that are possibly non-Lipschitz with 
respect to the Hilbert-Schmidt norm. 

The new method can be called an extension of Stein's method of exchangeable
pairs [17], as developed by the author in his Ph.D. thesis [2]. 
From a different angle, it can also be viewed as a discrete analog of the 
semigroup tool for measure concentration (see Ledoux [9], section on 'semigroup
tools'; see also the discussion in Subsection 1.3 of this paper). Some 
other applications of our method can be found in ^ . 
The paper is organized as follows. We begin with the random matrix 
example (Theorem 1.1), followed by the statement of the main result
(Theorem 1.2), and a sketch of the proofs (Subsection 1.3). Section 1 ends with 
a brief discussion of the literature. Proofs of Theorems 1.2 and 1.1 are given 
in Section 2.

1.1. A random matrix example. Let M be an n x n complex hermitian 
(i.e. self-adjoint) matrix. The following terminology is standard. 

(1) The empirical spectral measure of M is the probability measure on $\mathbb{R}$, 
denoted by $\mu_M$, which puts mass 1/n on each eigenvalue of M, repeated 
by multiplicities. 

(2) The empirical distribution function of M, denoted by $F_M$, is the 
cumulative probability distribution function corresponding to the 
empirical spectral measure. 

(3) Any hermitian matrix that has the same spectrum as M can be written
as $UMU^*$ for some unitary matrix U. Thus, the Haar measure 
on the group of unitary matrices of order n naturally induces a 'uniform
distribution' on the set of all hermitian matrices with the same 
spectrum as M. We denote this probability measure by $p_M$.

Theorem 1.1. Let M and N be two hermitian matrices of order n. Suppose 
$A \sim p_M$ and $B \sim p_N$ are two independent random hermitian matrices. Let 
H = A + B. Then, for every $x \in \mathbb{R}$, $\mathrm{Var}(F_H(x)) \le \kappa n^{-1} \log n$, where $\kappa$ is a 
universal constant not depending on n, M, N or x. Moreover, we also have 
the concentration inequality 

    $\mathbb{P}\{|F_H(x) - \mathbb{E}(F_H(x))| \ge t\} \le 2\exp\left(-\frac{n t^2}{2\kappa \log n}\right)$

for every $t \ge 0$, where $\kappa$ is the same constant as in the variance bound. 

A remarkable aspect of Theorem 1.1 is that the constant $\kappa$ is just a numerical 
constant independent of everything else. Note also that $H \mapsto F_H(x)$ is a 
discontinuous map. We believe that such a result cannot be established via 
gaussian type concentration of measure for orthogonal and unitary matrices 
(Gromov & Milman [8] and Szarek [18]). 

Incidentally, Voiculescu used the results of Gromov-Milman and Szarek in 
his celebrated work [20] that connected free probability theory with random 
matrices. That is an example of an area where concentration results such 
as Theorem 1.1 may be relevant. 

1.2. The main result. Let G be a compact topological group. Then there 
exists a G-valued random variable X with the properties that for any $x \in G$, 
the random variables xX, Xx and $X^{-1}$ all have the same distribution as 
X. The distribution of X is called the (normalized) Haar measure on G. 
The existence and uniqueness of the Haar measure is a classical result (see 
e.g. Rudin [15], Theorem 5.14). Let Y be another G-valued random variable 
with the following properties: 
(1) The random variable $Y^{-1}$ has the same distribution as Y; that is, 
the law of Y is symmetric. 

(2) For any $x \in G$, $xYx^{-1}$ has the same distribution as Y. In other 
words, the distribution of Y is 'constant on the conjugacy classes 
of G'. 

Recall that for two random variables U and V taking values in some separable 
space $\mathcal{X}$, the supremum of $|\mathbb{P}(U \in B) - \mathbb{P}(V \in B)|$ as B ranges over all Borel 
subsets of $\mathcal{X}$ is called the total variation distance between the laws of U and 
V, often denoted simply by $d_{TV}(U, V)$. 
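
For laws on a finite set, the supremum in this definition equals half the L1 distance between the two probability vectors. The following minimal Python sketch is an illustration only (the function names and the two example laws are assumptions, not part of the paper); it computes the distance both ways and compares them.

```python
from itertools import combinations

def total_variation(p, q):
    """Total variation distance between two laws on a finite set.

    p and q are dicts mapping outcomes to probabilities. On a finite set the
    supremum of |P(U in B) - P(V in B)| over all B equals half the L1 distance.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)

def total_variation_bruteforce(p, q):
    """Same quantity, computed directly from the definition (all subsets B)."""
    support = sorted(set(p) | set(q))
    best = 0.0
    for r in range(len(support) + 1):
        for subset in combinations(support, r):
            gap = abs(sum(p.get(s, 0.0) - q.get(s, 0.0) for s in subset))
            best = max(best, gap)
    return best

# Two hypothetical laws on {a, b, c}, chosen only for illustration.
p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.2, "b": 0.3, "c": 0.5}
print(total_variation(p, q), total_variation_bruteforce(p, q))
```

The brute-force version enumerates every subset and is only practical for very small supports; it is included to make the link with the definition explicit.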

Theorem 1.2. Let G, X, Y be as above, with X and Y independent. Let 
$f : G \to \mathbb{R}$ be a bounded measurable function such that $\mathbb{E} f(X) = 0$. Let 
$\|f\|_\infty := \sup_{x \in G} |f(x)|$ and 

    $\|f\|_Y := \sup_{x \in G} \left[\mathbb{E}(f(x) - f(Yx))^2\right]^{1/2}.$

Let $Y_1, Y_2, \ldots$ be i.i.d. copies of Y. Suppose a and b are two positive constants
such that $d_{TV}(Y_1 Y_2 \cdots Y_k, X) \le a e^{-bk}$ for every k, where $d_{TV}$ is the 
total variation metric. Let A and B be two numbers such that $\|f\|_\infty \le A$ 
and $\|f\|_Y \le B$. Let 

    $C := \frac{B^2}{b}\left[\left(\log\frac{4aA}{B}\right)^{+} + \frac{b}{1 - e^{-b}}\right].$

Then $\mathrm{Var}(f(X)) \le C/2$, and for any $t \ge 0$, $\mathbb{P}\{|f(X)| \ge t\} \le 2e^{-t^2/C}$. 

The main term in the bound is $B^2/b$; the term within the brackets will 
always contribute just a 'factor of log n' in applications (see the discussion in 
the next subsection). 

Recall that if $a e^{-bk}$ expresses the correct rate of decay of the total variation 
distance, then $\tau := b^{-1} \log a$ is the mixing time of the Markov chain. Thus, 
the theorem roughly says the following: the deviation of f(X) from its mean 
is of the order of $B\sqrt{\tau}$, where B is a bound on the size of f(x) - f(Yx), and 
$\tau$ is the mixing time of the Markov chain induced by Y. 
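
To make the size of the constant concrete, here is a small Python sketch that evaluates C and the resulting tail bound, taking C to be $(B^2/b)[(\log(4aA/B))^{+} + b/(1 - e^{-b})]$, the form that the proof of Lemma 2.1 produces. The numerical inputs are hypothetical and the function names are illustrative assumptions.

```python
import math

def concentration_constant(a, b, A, B):
    """C = (B^2 / b) * [ (log(4aA/B))^+ + b / (1 - exp(-b)) ]."""
    log_term = max(math.log(4.0 * a * A / B), 0.0)   # positive part of the log
    return (B * B / b) * (log_term + b / (1.0 - math.exp(-b)))

def tail_bound(t, C):
    """Upper bound 2 exp(-t^2 / C) on P(|f(X)| >= t), capped at 1."""
    return min(1.0, 2.0 * math.exp(-t * t / C))

# Hypothetical inputs: d_TV <= a e^{-bk} with a = 10, b = 0.01,
# sup norm bound A = 1 and one-step bound B = 0.05.
a, b, A, B = 10.0, 0.01, 1.0, 0.05
C = concentration_constant(a, b, A, B)
tau = math.log(a) / b                      # mixing time b^{-1} log a
print("C =", C)
print("B^2 * tau =", B * B * tau)          # same order as C, up to the log factor
print("tail at t = 3 sqrt(C):", tail_bound(3.0 * math.sqrt(C), C))
```

With these inputs C and $B^2\tau$ differ only by a moderate factor, which is the 'deviation of order $B\sqrt{\tau}$' heuristic stated above.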

1.3. Outline of the proofs. Given a reversible Markov kernel P and a 
function f on the state space, the function 

(1)  $F(x, y) := \sum_{k=0}^{\infty} (P^k f(x) - P^k f(y))$

has the properties that $F(x, y) = -F(y, x)$, and 

    $\mathbb{E}(F(X_0, X_1) \mid X_0) = f(X_0) - \mathbb{E} f(X_0),$

where $X_0, X_1, \ldots$ is a stationary Markov chain from the kernel P. Using 
these two properties and some intuition from Stein's method, we show that 

(2)  $\mathrm{Var}(f(X_0)) = \frac{1}{2}\mathbb{E}((f(X_0) - f(X_1))F(X_0, X_1)).$
For a continuous Markov semigroup $(P_t)_{t \ge 0}$ with unique invariant measure 
$\mu$, the above identity is easily seen to be equivalent to 

    $\mathrm{Var}_\mu(f) = \int_0^\infty \mathcal{E}(f, P_t f)\, dt,$

where $\mathcal{E}$ is the Dirichlet form corresponding to the pair $((P_t)_{t \ge 0}, \mu)$. Identities
like this form the basis of the semigroup method for measure concentration.
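
The discrete identity (2) can be checked numerically on a small example. The sketch below is an illustration under stated assumptions: the 3-state symmetric kernel and the function f are arbitrary choices, and the infinite sum defining F is truncated after a fixed number of terms.

```python
# Symmetric (hence reversible) kernel on {0, 1, 2}; its stationary law is uniform.
P = [[0.5, 0.3, 0.2],
     [0.3, 0.4, 0.3],
     [0.2, 0.3, 0.5]]
pi = [1.0 / 3] * 3
f = [1.0, -2.0, 4.0]

def apply_kernel(P, g):
    """(Pg)(x) = sum_y P(x, y) g(y)."""
    return [sum(P[x][y] * g[y] for y in range(len(g))) for x in range(len(g))]

def antisymmetric_F(P, f, n_terms=500):
    """Truncation of F(x, y) = sum_k (P^k f(x) - P^k f(y))."""
    n = len(f)
    F = [[0.0] * n for _ in range(n)]
    g = list(f)                       # g holds P^k f at step k
    for _ in range(n_terms):
        for x in range(n):
            for y in range(n):
                F[x][y] += g[x] - g[y]
        g = apply_kernel(P, g)
    return F

F = antisymmetric_F(P, f)
mean = sum(pi[x] * f[x] for x in range(3))
variance = sum(pi[x] * (f[x] - mean) ** 2 for x in range(3))
# Right-hand side of (2): (1/2) E[(f(X0) - f(X1)) F(X0, X1)] under stationarity.
rhs = 0.5 * sum(pi[x] * P[x][y] * (f[x] - f[y]) * F[x][y]
                for x in range(3) for y in range(3))
print(variance, rhs)
```

The truncation is harmless here because the differences $P^k f(x) - P^k f(y)$ decay geometrically for this kernel.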

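Now suppose we can produce a number B such that for all k, and all x, y 
such that y can be reached from x in one step of the chain (i.e. x and y are 
'neighbors'), we have 

(3)  $|P^k f(x) - P^k f(y)| \le B.$

Suppose also that beyond the mixing time $\tau$ of the chain, $P^k f(x) - P^k f(y)$ vanishes
exponentially fast. Combining, we see from the definition (1) of F that 

(4)  $|F(x, y)| \lesssim B\tau$ for neighboring x and y, and hence, by (2), $\mathrm{Var}(f(X_0)) \lesssim B^2\tau.$

This is the essence of the variance bound in Theorem 1.2. The concentration 
inequality is obtained along a similar line. 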

In practice, an inequality like (3) is not very easy to establish. In fact, in 
the proof of Theorem 1.2 we are only able to prove (3) in an average sense. 
The 'constant on conjugacy classes' condition imposed on the random walk 
is required for our proof of (3). The key idea is to construct a coupling such 
that if two chains are started at neighboring sites, they continue to be on 
neighboring sites at each step. 

The log factor. The log factor in Theorem 1.2 arises from the log n terms 
appearing in the mixing times of random walks. In the above sketch, we 
used the fact that $P^k f(x) - P^k f(y)$ vanishes rapidly beyond $k > \tau$. Now, 
if x and y are neighboring states, this vanishing probably happens more quickly, 
typically in n steps instead of n log n. But this is not stated in the standard 
theorems on Markov chain mixing. Any result in this direction (e.g. via 
path coupling) will suffice to remove the log factor from the statement of 
Theorem 1.2.

The proof of Theorem 1.1 is a direct application of Theorem 1.2, proceeding
as follows. First, we fix $x \in \mathbb{R}$ and two matrices M and N as in the 
statement of Theorem 1.1. It is not difficult to see that the law of $F_H(x)$ 
is the same as that of $F_{UMU^* + N}(x)$, where U is a Haar distributed unitary 
matrix. Accordingly, the state space is taken to be the set of n x n unitary 
matrices $\mathcal{U}_n$, and the function f is defined as 

    $f(U) = F_{UMU^* + N}(x).$
We consider a random walk on $\mathcal{U}_n$ generated by conjugation with certain 
random reflections. The total variation rate of convergence to equilibrium 
for this walk is directly available from the literature [14]. 

1.4. Discussion of existing literature. There is not much literature on 
the concentration of Haar measures. One early result is due to Maurey [11], 
who investigated the Haar measure on the group $S_n$ of permutations of n 
elements. The setting in Maurey's theorem is a particular case of ours, with 
Y being a random transposition of two elements. 

Maurey's result was generalized in the lecture notes of Milman and Schechtman
([12], Theorem 7.12) using a martingale argument. Talagrand, in his 
famous treatment [19], made a substantial improvement on Maurey's result 
that allows one to go beyond 'bounded differences'. The recent paper of 
Luczak and McDiarmid [10] is also worthy of note. 

The other group that has been studied for concentration of measure is the 
special orthogonal group $SO_n$, i.e., the group of n x n orthogonal matrices 
with determinant 1. The chief result about the concentration of Haar measure
on this group is due to Gromov & Milman [8]. As mentioned before, 
this result was used by Voiculescu [20] in his work connecting random matrix 
theory with free probability. 

However, other than the results about $S_n$ and $SO_n$ mentioned above, there 
is very little general theory about the concentration of Haar measures. 
Theorem 1.2 is possibly the first result of its kind, and also the first result 
that connects rates of convergence to stationarity of random walks on groups 
with concentration of the invariant measures. Random walks on groups have 
received extensive attention following the pioneering works of Diaconis and 
Shahshahani [6] and Diaconis and Saloff-Coste [5]. Theorem 1.2 allows us 
to translate results about the rate of convergence to stationarity of random 
walks on groups which are 'constant on conjugacy classes' to concentration 
inequalities under the Haar measure. Indeed, we will use one such available 
result [14] to obtain the concentration of the Haar measure on the group 
of unitary matrices of order n with respect to the rank distance for n x n 
matrices (the rank distance is defined as d(A, B) := rank(A - B)). 

Finally, let us clarify that the 'concentration property of groups' as defined 
by Gromov & Milman [8] and investigated by Pestov (see, e.g. [13]) is not 
related to the sort of things that we are investigating. 

2. Proofs 

2.1. Proof of Theorem 1.2. We begin with the observation that Y defines 
a reversible Markov kernel P in a natural way: for any $f : G \to \mathbb{R}$ such that 
$\mathbb{E}|f(X)| < \infty$, let 

(5)  $Pf(x) := \mathbb{E} f(Yx) = \mathbb{E} f(x x^{-1} Y x) = \mathbb{E} f(xY).$

The reversibility of this kernel can be proved as follows. Since yX has the 
same distribution as X for any $y \in G$, and X, Y are independent, 
Y and YX are also independent. Also, $Y^{-1}$ has the same distribution 
as Y. Hence, the pair (X, Y) has the same distribution as $(YX, Y^{-1})$. 
Consequently, the pairs (X, YX) and $(YX, Y^{-1}YX) = (YX, X)$ also have 
the same distribution. In other words, (X, YX) is an exchangeable pair of 
random variables. This is equivalent to saying that P is a reversible Markov 
kernel. The following lemma gives the most important information about 
this kernel that we require. 
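
The two assumptions on Y and the resulting reversibility can be verified by brute force on a small finite group. The following Python sketch is an illustration only: it takes G to be the symmetric group $S_3$ and Y to be the swap of two independently and uniformly chosen positions (so Y is the identity with probability 1/n), which is an assumed example rather than the walk used later in the paper. It also computes the exact total variation distance of $Y_1 \cdots Y_k$ from the uniform law.

```python
from itertools import permutations

n = 3
group = list(permutations(range(n)))          # elements of S_n as tuples

def compose(p, q):
    """Product pq, acting as i -> p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

# Law of Y: swap positions i and j, with i and j independent and uniform.
law_Y = {g: 0.0 for g in group}
for i in range(n):
    for j in range(n):
        swap = list(range(n))
        swap[i], swap[j] = swap[j], swap[i]
        law_Y[tuple(swap)] += 1.0 / n ** 2

# Property (1): Y^{-1} has the law of Y. Property (2): x Y x^{-1} has the law of Y.
symmetric = all(abs(law_Y[g] - law_Y[inverse(g)]) < 1e-12 for g in group)
conj_invariant = all(
    abs(law_Y[y] - law_Y[compose(compose(x, y), inverse(x))]) < 1e-12
    for x in group for y in group)

def kernel(x, z):
    """Transition probability P(Yx = z) = P(Y = z x^{-1})."""
    return law_Y[compose(z, inverse(x))]

# Reversibility with respect to the uniform (Haar) law means kernel(x, z) == kernel(z, x).
reversible = all(abs(kernel(x, z) - kernel(z, x)) < 1e-12
                 for x in group for z in group)

def tv_to_uniform(law):
    return 0.5 * sum(abs(prob - 1.0 / len(group)) for prob in law.values())

# Exact law of Y_1 Y_2 ... Y_k, started from the identity.
law = {g: 0.0 for g in group}
law[tuple(range(n))] = 1.0
tv = [tv_to_uniform(law)]
for _ in range(8):
    new_law = {g: 0.0 for g in group}
    for g, pg in law.items():
        for y, py in law_Y.items():
            new_law[compose(y, g)] += py * pg
    law = new_law
    tv.append(tv_to_uniform(law))

print(symmetric, conj_invariant, reversible)
print([round(d, 6) for d in tv])
```

The printed distances decay geometrically in k, which is the kind of bound $a e^{-bk}$ assumed in Theorem 1.2.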

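Lemma 2.1. Under the hypotheses of Theorem 1.2, and with P defined 
in (5), we have, for each $x \in G$,

    $\sum_{k=0}^{\infty} \mathbb{E}\left|(f(x) - f(Yx))(P^k f(x) - P^k f(Yx))\right| \le \frac{B^2}{b}\left[\left(\log\frac{4aA}{B}\right)^{+} + \frac{b}{1 - e^{-b}}\right].$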

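Proof. Note that for any $x \in G$, 

    $|P^k f(x)| = |P^k f(x) - \mathbb{E} f(X)| = |\mathbb{E} f(Y_1 \cdots Y_k x) - \mathbb{E} f(Xx)|$
    $\le 2\|f\|_\infty\, d_{TV}(Y_1 \cdots Y_k, X) \le 2\|f\|_\infty\, a e^{-bk}.$

This shows, in particular, that for any $x \in G$, we have 

(6)  $\sum_{k=0}^{\infty} |P^k f(x)| \le \frac{2\|f\|_\infty\, a}{1 - e^{-b}} < \infty.$

More importantly, it gives the bound 

(7)  $\mathbb{E}|(f(x) - f(Yx))(P^k f(x) - P^k f(Yx))|$
     $\le 4\|f\|_\infty\, a e^{-bk}\, \mathbb{E}|f(x) - f(Yx)| \le 4\|f\|_\infty \|f\|_Y\, a e^{-bk}.$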

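Now recall assumption (2): for any $y \in G$, $y^{-1}Yy$ has the same 
distribution as Y. Thus, for any $x, y \in G$, 

    $Pf(yx) = \mathbb{E} f(Yyx) = \mathbb{E} f(y y^{-1} Y y x) = \mathbb{E} f(yYx).$

So, if we let Y' be an independent copy of Y, then 

    $\mathbb{E}(Pf(x) - Pf(Yx))^2 = \mathbb{E}\left(\mathbb{E}(f(Y'x) - f(YY'x) \mid Y)^2\right)$
    $\le \mathbb{E}(f(Y'x) - f(YY'x))^2$
    $\le \sup_{y' \in G} \mathbb{E}(f(y'x) - f(Yy'x))^2 = \|f\|_Y^2.$

Thus, $\|Pf\|_Y \le \|f\|_Y$. Continuing by induction, we get $\|P^k f\|_Y \le \|f\|_Y$ for every k. 
This gives 

(8)  $\mathbb{E}|(f(x) - f(Yx))(P^k f(x) - P^k f(Yx))|$
     $\le (\mathbb{E}(f(x) - f(Yx))^2)^{1/2} (\mathbb{E}(P^k f(x) - P^k f(Yx))^2)^{1/2}$
     $\le \|f\|_Y \|P^k f\|_Y \le \|f\|_Y^2.$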
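Using (7) and (8), we get 

    $\sum_{k=0}^{\infty} \mathbb{E}|(f(x) - f(Yx))(P^k f(x) - P^k f(Yx))|$
    $\le \sum_{k=0}^{\infty} \min\{\|f\|_Y^2,\ 4a\|f\|_\infty \|f\|_Y e^{-bk}\}$
    $\le \sum_{k=0}^{\infty} \min\{B^2,\ 4aABe^{-bk}\} = B^2 \sum_{k=0}^{\infty} \min\{1,\ 4aAB^{-1}e^{-bk}\}.$

We shall now compute a bound on the above sum. For ease of notation let 
$\beta = 4aAB^{-1}$, and let $\gamma = b^{-1}\log\beta$. If $\beta < 1$, the sum is just a geometric 
series, which is bounded by $\beta/(1 - e^{-b}) \le 1/(1 - e^{-b})$. Now assume $\beta \ge 1$. Then $\gamma$ is nonnegative. 
Now, an easy verification shows that $\beta e^{-b\gamma} = 1$, and $1 \ge \beta e^{-bk}$ if and only 
if $k \ge \gamma$. Let $k_0$ be the integer such that $k_0 - 1 < \gamma \le k_0$. Then 

    $\sum_{k=0}^{\infty} \min\{1, \beta e^{-bk}\} \le k_0 + \sum_{k \ge k_0} \beta e^{-bk} = k_0 + \frac{\beta e^{-bk_0}}{1 - e^{-b}}.$

Now the function 

    $g : x \mapsto x + \frac{\beta e^{-bx}}{1 - e^{-b}}$

is convex and is therefore upper bounded by $\max\{g(\gamma), g(\gamma + 1)\}$ on the 
interval $[\gamma, \gamma + 1]$. A simple verification now shows that 

    $g(\gamma) = g(\gamma + 1) = \gamma + \frac{1}{1 - e^{-b}}.$

Since $k_0 \in [\gamma, \gamma + 1]$, the sum is at most $\gamma + 1/(1 - e^{-b})$, and multiplying by $B^2$ gives the claimed bound.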

This completes the proof of the lemma. □ 
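
The elementary bound on the series used in this proof is easy to confirm numerically. The sketch below is an illustration only (the parameter pairs are arbitrary choices and the series is truncated); it compares $\sum_k \min\{1, \beta e^{-bk}\}$ with $\gamma^{+} + 1/(1 - e^{-b})$, where $\gamma = b^{-1}\log\beta$.

```python
import math

def series(beta, b, n_terms=20000):
    """Truncated value of sum_{k >= 0} min(1, beta * exp(-b k))."""
    return sum(min(1.0, beta * math.exp(-b * k)) for k in range(n_terms))

def lemma_bound(beta, b):
    """gamma^+ + 1 / (1 - e^{-b}), with gamma = log(beta) / b."""
    gamma_plus = max(math.log(beta), 0.0) / b
    return gamma_plus + 1.0 / (1.0 - math.exp(-b))

# Arbitrary test cases covering beta < 1, beta = 1 and beta > 1.
cases = [(0.5, 0.3), (1.0, 0.3), (40.0, 0.05), (800.0, 0.01)]
results = [(beta, b, series(beta, b), lemma_bound(beta, b)) for beta, b in cases]
for beta, b, value, bound in results:
    print(beta, b, value, bound)
```

For $\beta = 1$ the two quantities coincide, so the bound cannot be improved in general without changing its form.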

Now define the function $F : G \times G \to \mathbb{R}$ as 

    $F(x_1, x_2) := \sum_{k=0}^{\infty} (P^k f(x_1) - P^k f(x_2)),$

where f is the function under consideration in Theorem 1.2. By (6), the 
sum converges everywhere. The following lemma establishes the relevant 
properties of F. 

Lemma 2.2. The function F satisfies $F(x_1, x_2) = -F(x_2, x_1)$ and

    $\mathbb{E}(F(x, Yx)) = f(x).$

Proof. The first property is obvious. Now, $\mathbb{E}(P^k f(Yx)) = P^{k+1} f(x)$. Thus, 
for any N, we have 

    $\sum_{k=0}^{N} \mathbb{E}(P^k f(x) - P^k f(Yx)) = f(x) - P^{N+1} f(x).$

Now, by (6), we have $\lim_{N \to \infty} P^N f(x) = 0$. The uniform bound in (6) 
also allows us to use the dominated convergence theorem. This completes 
the proof of the lemma. □ 
We are now ready to finish the proof of Theorem 1.2. First, let us define 
the function 

    $v(x) := \mathbb{E}|(f(x) - f(Yx))F(x, Yx)|.$

By Lemma 2.2 and the independence of X and Y, we get 

(9)  $\mathbb{E}(f(X)^2) = \mathbb{E}(f(X)F(X, YX)).$

Since $F(x_1, x_2) = -F(x_2, x_1)$, we also get $\mathbb{E}(f(X)^2) = -\mathbb{E}(f(X)F(YX, X))$. 
Now, as proved in the beginning of this section, (X, YX) is an exchangeable 
pair of random variables. Thus, 

(10)  $\mathbb{E}(f(X)^2) = -\mathbb{E}(f(X)F(YX, X)) = -\mathbb{E}(f(YX)F(X, YX)).$

Combining (9) and (10), we get 

    $\mathbb{E}(f(X)^2) = \frac{1}{2}\mathbb{E}((f(X) - f(YX))F(X, YX)) \le \frac{1}{2}\mathbb{E}(v(X)).$

By Lemma 2.1, $|v(x)| \le C$ for each x, where C is as defined in the statement 
of Theorem 1.2. This proves the second moment bound. 

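For the exponential inequality, let us define $\varphi(\theta) := \mathbb{E}(e^{\theta f(X)})$ for each 
$\theta \in \mathbb{R}$. Since f is a bounded function, $\varphi$ is differentiable and 

    $\varphi'(\theta) = \mathbb{E}(e^{\theta f(X)} f(X)) = \mathbb{E}(e^{\theta f(X)} F(X, YX)).$

Proceeding exactly as before, we get 

    $\varphi'(\theta) = \frac{1}{2}\mathbb{E}((e^{\theta f(X)} - e^{\theta f(YX)})F(X, YX)).$

Now, for any $u, v \in \mathbb{R}$ with $u \ne v$, we have 

    $\left|\frac{e^u - e^v}{u - v}\right| = \int_0^1 e^{tu + (1-t)v}\,dt \le \int_0^1 (te^u + (1-t)e^v)\,dt = \frac{1}{2}(e^u + e^v).$

Using this, the exchangeability of (X, YX) and the symmetry of |F|, 
we get 

    $|\varphi'(\theta)| \le \frac{|\theta|}{4}\mathbb{E}((e^{\theta f(X)} + e^{\theta f(YX)})|(f(X) - f(YX))F(X, YX)|)$
    $= \frac{|\theta|}{2}\mathbb{E}(e^{\theta f(X)}|(f(X) - f(YX))F(X, YX)|)$
    $= \frac{|\theta|}{2}\mathbb{E}(e^{\theta f(X)} v(X)) \le \frac{C|\theta|}{2}\varphi(\theta).$

This gives $\log\varphi(\theta) \le C\theta^2/4$ for all $\theta$. The proof can now be easily completed 
via routine arguments: by Markov's inequality, $\mathbb{P}\{f(X) \ge t\} \le e^{-\theta t + C\theta^2/4}$ for 
every $\theta > 0$; taking $\theta = 2t/C$ gives the bound $e^{-t^2/C}$, and the same argument 
applied to -f gives the two-sided inequality. 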






2.2. Proof of Theorem 1.1. Throughout this subsection, $\mathcal{U}_n$ will denote 
the group of unitary matrices of order n. To prove Theorem 1.1, we first 
need to establish a theorem about the concentration of the Haar measure on 
$\mathcal{U}_n$. Existing results of the type discussed in Section 1 cannot give concentration
bounds for $F_H$, since they are based on the Hilbert-Schmidt distance, 
which is not suitable for this purpose. Instead, we shall work with the rank 
distance, defined as d(M, N) := rank(M - N). The empirical distribution 
function is well-behaved with respect to this metric, as shown by the following
lemma of Bai [1]: 

Lemma 2.3. [Bai [1], Lemma 2.2] Let M and N be two n x n hermitian 
matrices, with empirical distribution functions $F_M$ and $F_N$. Then 

    $\|F_M - F_N\|_\infty \le \frac{1}{n}\,\mathrm{rank}(M - N).$

This lemma is an easy consequence of the interlacing inequalities for eigenvalues
of hermitian matrices. (It seems possible that this already existed in 
the literature before [1], but we could not find any reference.) 
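
A tiny instance of the lemma can be checked by hand or by machine. The following Python sketch is an illustration under stated assumptions: it uses a hypothetical 2 x 2 real symmetric matrix M and a rank-one perturbation $N = M + vv^T$, for which the eigenvalues are available in closed form, and evaluates the sup-distance between the two empirical distribution functions.

```python
import math

def eigenvalues_2x2(a, b, c):
    """Eigenvalues of the real symmetric matrix [[a, b], [b, c]] (closed form)."""
    mid, rad = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
    return [mid - rad, mid + rad]

def empirical_cdf(eigs, x):
    """F_M(x): fraction of eigenvalues that are <= x."""
    return sum(1 for e in eigs if e <= x) / len(eigs)

def sup_distance(eigs_m, eigs_n):
    """sup_x |F_M(x) - F_N(x)|.

    Both functions are right-continuous step functions, so the supremum is
    attained at one of the jump points (the eigenvalues themselves).
    """
    return max(abs(empirical_cdf(eigs_m, x) - empirical_cdf(eigs_n, x))
               for x in eigs_m + eigs_n)

# M = [[2, 1], [1, 3]] and N = M + v v^T with v = (1, 2): rank(M - N) = 1, n = 2.
eigs_m = eigenvalues_2x2(2.0, 1.0, 3.0)
eigs_n = eigenvalues_2x2(2.0 + 1.0, 1.0 + 2.0, 3.0 + 4.0)
distance = sup_distance(eigs_m, eigs_n)
print(eigs_m, eigs_n, distance)
```

Here the distance equals rank(M - N)/n = 1/2, so the inequality of the lemma holds with equality in this example; the eigenvalues of M and N interlace, as the proof via interlacing predicts.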

To find the concentration of the Haar measure on $\mathcal{U}_n$ with respect to the 
rank distance, we need a random walk which takes 'small steps' with respect 
to this metric. 

Let $G = \mathcal{U}_n$ and X be a Haar-distributed random variable on $\mathcal{U}_n$. We 
define the random variable Y required for generating the random walk for Theorem 1.2 as 
follows: Let $Y = I - (1 - e^{i\varphi})uu^*$, where u is drawn uniformly from the unit 
sphere in $\mathbb{C}^n$, and $\varphi$ is drawn independently from the distribution on $[0, 2\pi)$ 
with density proportional to $(\sin(\varphi/2))^{n-1}$. Multiplication by Y represents 
a random reflection across a randomly chosen subspace. It is easy to verify 
that $Y \in \mathcal{U}_n$. Now, $Y^{-1} = Y^* = I - (1 - e^{-i\varphi})uu^* = I - (1 - e^{i(2\pi - \varphi)})uu^*$ 
has the same distribution as Y, since $2\pi - \varphi$ has the same distribution as 
$\varphi$. Also, for any $U \in \mathcal{U}_n$, 

    $UYU^* = I - (1 - e^{i\varphi})(Uu)(Uu)^*,$

and Uu is again uniformly distributed over the unit sphere in $\mathbb{C}^n$. Hence Y 
satisfies all the properties required for Theorem 1.2.
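
The two algebraic facts used here, that Y is unitary and that $Y^*$ is the reflection with angle $2\pi - \varphi$, can be confirmed numerically. The Python sketch below is illustrative only: the helper names are assumptions, and the angle is sampled uniformly rather than from the $(\sin(\varphi/2))^{n-1}$ density, since both identities hold for every angle.

```python
import cmath
import math
import random

def random_unit_vector(n, rng):
    """Uniform point on the unit sphere of C^n (a normalized complex Gaussian)."""
    z = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    norm = math.sqrt(sum(abs(c) ** 2 for c in z))
    return [c / norm for c in z]

def reflection(u, phi):
    """Y = I - (1 - e^{i phi}) u u^*, stored as a list of rows."""
    n, delta = len(u), 1 - cmath.exp(1j * phi)
    return [[(1 if i == j else 0) - delta * u[i] * u[j].conjugate()
             for j in range(n)] for i in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def adjoint(A):
    n = len(A)
    return [[A[j][i].conjugate() for j in range(n)] for i in range(n)]

def max_abs_diff(A, B):
    n = len(A)
    return max(abs(A[i][j] - B[i][j]) for i in range(n) for j in range(n))

rng = random.Random(0)
n = 4
u = random_unit_vector(n, rng)
phi = rng.uniform(0, 2 * math.pi)        # any angle works for these identities
Y = reflection(u, phi)
identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

unitarity_defect = max_abs_diff(matmul(Y, adjoint(Y)), identity)
inverse_defect = max_abs_diff(adjoint(Y), reflection(u, 2 * math.pi - phi))
print(unitarity_defect, inverse_defect)
```

Both defects are at the level of floating-point rounding, consistent with $Y Y^* = I$ and $Y^{-1} = I - (1 - e^{i(2\pi - \varphi)})uu^*$.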

Following a sketch of Diaconis & Shahshahani [6], Ursula Porod [14] 
proved the following result about the rate of convergence to stationarity 
of the random walk induced by Y: 

Theorem 2.4. [Porod [14]] Let X, Y be as above. Let $Y_1, Y_2, \ldots$ be i.i.d. 
copies of Y. There exist universal constants $\alpha, \beta, c_0$ such that whenever 
$n \ge 16$ and $k \ge \frac{1}{2} n \log n + c_0 n$, we have 

(11)  $d_{TV}(Y_1 \cdots Y_k, X) \le \alpha n^{\beta/2} e^{-\beta k/n},$

where $d_{TV}$ denotes the total variation distance. 

Substituting $k = \frac{1}{2} n \log n + c_0 n$, we get $\alpha e^{-\beta c_0}$ on the right hand side. 
Thus by suitably increasing $\alpha$ such that $\alpha e^{-\beta c_0} \ge 1$, we can drop the condition
that $k \ge \frac{1}{2} n \log n + c_0 n$, since for smaller k the right hand side of (11) is then at least 1. 
Combining Porod's theorem with Theorem 1.2, 
we get the following result about concentration of the Haar measure on $\mathcal{U}_n$.



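Proposition 2.5. Let $G = \mathcal{U}_n$ and X, Y be as above, with $n \ge 16$. Let 
$f : \mathcal{U}_n \to \mathbb{R}$ be a function such that $\mathbb{E} f(X) = 0$. Let 
$\|f\|_Y = \sup_{U \in \mathcal{U}_n} [\mathbb{E}(f(U) - f(YU))^2]^{1/2}$. Let A and B be constants such that 
$\|f\|_\infty \le A$ and $\|f\|_Y \le B$. Let 

    $C := \frac{nB^2}{\beta}\left[\left(\log\frac{4\alpha n^{\beta/2} A}{B}\right)^{+} + \frac{\beta/n}{1 - e^{-\beta/n}}\right],$

where $\alpha$ and $\beta$ are as in (11). Then $\mathrm{Var}(f(X)) \le C/2$, and for any $t \ge 0$, 
we have $\mathbb{P}\{|f(X)| \ge t\} \le 2e^{-t^2/C}$. 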



We are now ready to finish the proof of Theorem 1.1.

Proof of Theorem 1.1. Suppose U and V are independent Haar-distributed 
random unitary matrices of order n, and $H = UMU^* + VNV^*$. The matrix 
$V^*HV = V^*UMU^*V + N$ has the same spectrum as H. Also, $V^*U$ is again 
Haar distributed. Hence, we can consider, without loss of generality, the 
spectrum of 

    $H = XMX^* + N,$

where X follows the Haar distribution on $\mathcal{U}_n$. Now let 

    $H' = (YX)M(YX)^* + N.$

Recall that $Y = I - (1 - e^{i\varphi})uu^*$, where u is drawn from the uniform 
distribution on the unit sphere in $\mathbb{C}^n$, and $\varphi$ is drawn independently from 
the distribution on $[0, 2\pi)$ with density proportional to $(\sin(\varphi/2))^{n-1}$. Let 
$\delta = 1 - e^{i\varphi}$. Then 

    $H - H' = XMX^* - (I - \delta uu^*)XMX^*(I - \bar{\delta} uu^*)$
    $= \delta uu^*XMX^* + \bar{\delta} XMX^*uu^* - |\delta|^2 uu^*XMX^*uu^*.$

The three summands are all of rank at most 1 and thus $\mathrm{rank}(H - H') \le 3$. Thus by 
Lemma 2.3 we see that 

(12)  $\|F_H - F_{H'}\|_\infty \le \frac{3}{n}.$

Now fix a point $x \in \mathbb{R}$, and let $f : \mathcal{U}_n \to \mathbb{R}$ be the map which takes X to 
$F_H(x)$. Then by (12), we have 

    $|f(X) - f(YX)| \le \frac{3}{n}$ for all possible values of X and Y. 

Thus, $\|f\|_Y \le 3/n$. Also, $\|f\|_\infty \le 1$, and both bounds continue to hold for the 
centered function $f - \mathbb{E} f(X)$. Thus, applying Proposition 2.5 to the centered function 
with A = 1 and B = 3/n, we get $C \le n^{-1}(\kappa \log n + c)$ for some universal constants 
$\kappa$ and c. By choosing $\kappa$ large 
enough, we can drop the assumption that $n \ge 16$ and also put c = 0. This 
completes the proof. □ 
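
The order of magnitude of C in this last step can be sanity-checked numerically. In the sketch below, ALPHA and BETA are hypothetical stand-ins for Porod's universal constants (their true values are not specified in the text), and C is evaluated from the formula of Proposition 2.5 with A = 1 and B = 3/n; the point is only that nC / log n stays bounded as n grows.

```python
import math

ALPHA, BETA = 5.0, 0.5   # hypothetical stand-ins for the universal constants in (11)

def constant_C(n, A=1.0):
    """C of Proposition 2.5 with a = ALPHA * n^(BETA/2), b = BETA / n, B = 3 / n."""
    a, b, B = ALPHA * n ** (BETA / 2.0), BETA / n, 3.0 / n
    log_term = max(math.log(4.0 * a * A / B), 0.0)   # positive part of the log
    return (B * B / b) * (log_term + b / (1.0 - math.exp(-b)))

sizes = [16, 64, 256, 1024, 4096]
ratios = [n * constant_C(n) / math.log(n) for n in sizes]
for n, ratio in zip(sizes, ratios):
    print(n, constant_C(n), ratio)   # the last column should stay bounded
```

With these assumed constants the ratio decreases slowly with n, in line with a bound of the form $C \le n^{-1}(\kappa \log n + c)$.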